Abstract

We consider a generalized version of the correlation clustering problem, defined as follows. Given a 
complete graph G whose edges are labeled with + or —, we wish to partition the graph into clusters 
while trying to avoid errors: + edges between clusters or — edges within clusters. Classically, one seeks 
to minimize the total number of such errors. We introduce a new framework that allows the objective to 
be a more general function of the number of errors at each vertex (for example, we may wish to minimize 
the number of errors at the worst vertex) and provide a rounding algorithm which converts “fractional 
clusterings” into discrete clusterings while causing only a constant-factor blowup in the number of errors 
at each vertex. This rounding algorithm yields constant-factor approximation algorithms for the discrete 
problem under a wide variety of objective functions. 


1 Introduction 

Correlation clustering is a clustering model first introduced by Bansal, Blum, and Chawla [SI IB]. The basic 
form of the model is as follows. We are given a collection of objects and, for some pairs of objects, we 
are given a judgment of whether the objects are similar or dissimilar. This information is represented as a 
labeled graph, with edges labeled + or − according to whether the endpoints are similar or dissimilar. Our
goal is to cluster the graph so that + edges tend to be within clusters and − edges tend to go across clusters.
The number of clusters is not specified in advance; determining the optimal number of clusters is instead 
part of the optimization problem. 

Given a solution clustering, an error is a + edge whose endpoints lie in different clusters or a − edge
whose endpoints lie in the same cluster. In the original formulation of correlation clustering, the goal is
to minimize the total number of errors; this formulation of the optimization problem is called MinDisagree.
Finding an exact optimal solution is NP-hard even when the input graph is complete OIB]. Furthermore, 
if the input graph is allowed to be arbitrary, the best known approximation ratio is O(log n), obtained
by ini Uni HI]. Assuming the Unique Games Conjecture of Khot m, no constant-factor approximation for
MinDisagree on arbitrary graphs is possible; this follows from the results of mun] concerning the minimum 
multicut problem and the connection between correlation clustering and minimum multicut described in 

laiinKii]. 

Since theoretical barriers appear to preclude constant-factor approximations on arbitrary graphs, much 
research has focused on special graph classes such as complete graphs and complete bipartite graphs, which 
are the graph classes we consider here. Ailon, Charikar, and Newman [an] gave a very simple randomized
3-approximation algorithm for MinDisagree on complete graphs. This algorithm was derandomized by
van Zuylen and Williamson [24], and a parallel version of the algorithm was studied by Pan, Papailiopoulos,
Recht, Ramchandran, and Jordan [20]. More recently, a 2.06-approximation algorithm was announced by
Chawla, Makarychev, Schramm and Yaroslavtsev [12]. Similar results have been obtained for complete
bipartite graphs. The first constant approximation algorithm for correlation clustering on complete bipartite
graphs was described by Amit [I], who gave an 11-approximation algorithm. This ratio was improved by
Ailon, Avigdor-Elgrabli, Liberty and van Zuylen [I], who obtained a 4-approximation algorithm. Chawla,
Makarychev, Schramm and Yaroslavtsev [12] announced a 3-approximation algorithm for correlation
clustering on complete k-partite graphs, for arbitrary k, which includes the complete bipartite case. Bipartite
clustering has also been studied, outside the correlation-clustering context, by Lim, Chen, and Xu [19].

We depart from the classical correlation-clustering literature by considering a broader class of objective
functions which also cater to the needs of many community-detection applications in machine learning, social
sciences, recommender systems and bioinformatics [HIllIIH]. The technical details of this class of functions
can be found in Section 2. As a representative example of this class, we introduce minimax correlation
clustering.

In minimax clustering, rather than seeking to minimize the total number of errors, we instead seek to 
minimize the number of errors at the worst-off vertex in the clustering. Put more formally, if for a given
clustering each vertex $v$ has $y_v$ incident edges that are errors, then we wish to find a clustering that minimizes
$\max_v y_v$.

Minimax clustering, like classical correlation clustering, is NP-hard on complete graphs, as we prove
in Appendix C. To design approximation algorithms for minimax clustering, it is necessary to bound the
growth of errors locally at each vertex when we round from a fractional clustering to a discrete clustering;
this introduces new difficulties in the design and analysis of our rounding algorithm. These new technical
difficulties cause the algorithm of Ailon, Charikar, and Newman to fail in the minimax context, and there is
no obvious way to adapt that algorithm to this new context; this phenomenon is explored further in Appendix A.

Minimax correlation clustering on graphs is relevant in detecting communities, such as gene, social 
network, or voter communities, in which no antagonists are allowed. Here, an antagonist refers to an entity 
that has properties inconsistent with a large number of members of the community. Alternatively, one 
may view the minimax constraint as enabling individual vertex quality control within the clusters, which is 
relevant in biclustering applications such as collaborative filtering for recommender systems, where minimum 
quality recommendations have to be ensured for each user in a given category. As an illustrative example, one 
may view a complete bipartite graph as a preference model in which nodes on the left represent viewers and 
nodes on the right represent movies. A positive edge between a user and a movie indicates that the viewer 
likes the movie, while a negative edge indicates that they do not like or have not seen the movie. We may be 
interested in finding communities of viewers for the purpose of providing them with joint recommendations. 
Using a minimax objective function here allows us to provide a uniform quality of recommendations, as we 
seek to minimize the number of errors for the user who suffers the most errors. 

A minimax objective function for a graph partitioning problem different from correlation clustering was
previously studied in [7]. In that paper, the problem under consideration was to split a graph into k
roughly-equal-sized parts, minimizing the maximum number of edges leaving any part. Thus, the maximum in [7]
is taken over the parts of the solution, rather than over vertices as we do here.

Another idea slightly similar to minimax clustering has previously appeared in the literature on
fixed-parameter tractability of the Cluster Editing problem, which is an equivalent formulation of
Correlation Clustering. In particular, Komusiewicz and Uhlmann m proved that the following problem is
fixed-parameter tractable for the combined parameter $(d, t)$:

$(d, t)$-Constrained-Cluster Editing

Input: A labeled complete graph $G$, a function $\tau : V(G) \to \{0, \ldots, t\}$, and nonnegative integers
$d$ and $k$.

Question: Does $G$ admit a clustering into at most $d$ clusters with at most $k$ errors such that
every vertex $v$ is incident to at most $\tau(v)$ errors?

(Here, we have translated their original formulation into the language of correlation clustering.) Komusiewicz 
and Uhlmann also obtained several NP-hardness results related to this formulation of the problem. While 
their work involves a notion of local errors for correlation clustering, their results are primarily focused on 
fixed-parameter tractability, rather than approximation algorithms, and are therefore largely orthogonal to 
the results of this paper. 

The contributions of this paper are organized as follows. In Section 2, we introduce and formally express
our framework for the generalized version of correlation clustering, which includes both classical clustering
and minimax clustering as special cases. In Section 3, we give a rounding algorithm which allows the
development of constant-factor approximation algorithms for the generalized clustering problem. In Section 4,
we give a version of this rounding algorithm for complete bipartite graphs.

In Appendix A, we discuss minimax clustering in more detail, and show that algorithms similar to the
Ailon-Charikar-Newman algorithm fail in the minimax context. In Appendix B, we discuss the approximation
properties of the MaxAgree formulation of minimax clustering, where the objective is to maximize the
number of correct edges, rather than minimize the number of incorrect edges, at the worst vertex. In
Appendix C and Appendix D, we prove that the minimax correlation clustering problem is NP-hard on
complete graphs and complete bipartite graphs, respectively. Appendix E contains technical details for
various proofs.


2 Framework and Formal Definitions 

In this section, we formally set up the framework we will use for our broad class of correlation-clustering 
objective functions. 

Definition 1. Let $G$ be an edge-labeled graph. A discrete clustering (or just a clustering) of $G$ is a partition
of $V(G)$. A fractional clustering of $G$ is a vector $x$ indexed by $\binom{V(G)}{2}$ such that $x_{uv} \in [0,1]$ for all
$uv \in \binom{V(G)}{2}$ and such that $x_{vz} \le x_{vw} + x_{wz}$ for all distinct $v, w, z \in V(G)$.

If $x$ is a fractional clustering, we can view $x_{uv}$ as a “distance” from $u$ to $v$; the constraints
$x_{vz} \le x_{vw} + x_{wz}$ are therefore referred to as triangle inequality constraints. We also adopt the
convention that $x_{uu} = 0$ for all $u$.
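The constraints of Definition 1 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical representation in which the vector is stored as a dictionary keyed by `frozenset({u, v})`; the function name and the tolerance parameter are illustrative choices, not part of the paper.

```python
from itertools import combinations, permutations

def is_fractional_clustering(x, vertices, tol=1e-9):
    """Check the box and triangle-inequality constraints of Definition 1.

    `x` maps frozenset({u, v}) to a number (an assumed representation).
    """
    def d(u, v):
        # Convention from the text: x_uu = 0.
        return 0.0 if u == v else x[frozenset((u, v))]

    # Every pair must carry a value in [0, 1].
    for u, v in combinations(vertices, 2):
        if not (-tol <= d(u, v) <= 1 + tol):
            return False
    # x_vz <= x_vw + x_wz for all distinct v, w, z.
    for v, w, z in permutations(vertices, 3):
        if d(v, z) > d(v, w) + d(w, z) + tol:
            return False
    return True

# Satisfies the triangle inequality with equality on (a, c).
x_good = {frozenset("ab"): 0.2, frozenset("bc"): 0.3, frozenset("ac"): 0.5}
# Violates it: 0.9 > 0.1 + 0.1.
x_bad = {frozenset("ab"): 0.1, frozenset("bc"): 0.1, frozenset("ac"): 0.9}
```

The check is cubic in the number of vertices, which matches the number of triangle constraints in the relaxation.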

In the special case where all coordinates of $x$ are 0 or 1, the triangle inequality constraints guarantee that
the relation defined by $u \sim v$ iff $x_{uv} = 0$ is an equivalence relation. Such a vector $x$ can therefore naturally
be viewed as a discrete clustering, where the clusters are the equivalence classes under $\sim$. By viewing a
discrete clustering as a fractional clustering with integer coordinates, we see that fractional clusterings are
a continuous relaxation of discrete clusterings, which justifies the name. This gives a natural notion of the
total weight of errors at a given vertex.

Definition 2. Let $G$ be an edge-labeled complete graph, and let $x$ be a fractional clustering of $G$. The
error vector of $x$ with respect to $G$, written $\mathrm{err}(x)$, is a real vector indexed by $V(G)$ whose coordinates are
defined by

$$\mathrm{err}(x)_v = \sum_{w \in N^+(v)} x_{vw} + \sum_{w \in N^-(v)} (1 - x_{vw}).$$

If $C$ is a clustering of $G$ and $x^C$ is the natural associated fractional clustering, we define $\mathrm{err}(C)$ as $\mathrm{err}(x^C)$.
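As an illustration of Definition 2, the sketch below computes the error vector for a small labeled complete graph. The dictionary-of-`frozenset` representation, the helper names `error_vector` and `clustering_to_x`, and the four-vertex example are assumptions made for illustration only.

```python
def error_vector(x, positive, vertices):
    """err(x)_v: sum of x_vw over positive neighbours w of v,
    plus sum of (1 - x_vw) over negative neighbours w of v."""
    err = {v: 0.0 for v in vertices}
    for v in vertices:
        for w in vertices:
            if w == v:
                continue
            e = frozenset((v, w))
            err[v] += x[e] if e in positive else 1.0 - x[e]
    return err

def clustering_to_x(clusters, vertices):
    """Natural 0/1 vector of a partition: 0 inside a cluster, 1 across clusters."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return {frozenset((u, v)): 0.0 if label[u] == label[v] else 1.0
            for u in vertices for v in vertices if u != v}

V = ["a", "b", "c", "d"]
# Positive edges form the path a-b-c-d; every other pair is negative.
positive = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d")]}
x = clustering_to_x([{"a", "b"}, {"c", "d"}], V)
err = error_vector(x, positive, V)
```

In this example the only error is the positive edge bc, which crosses clusters, so it is counted once at b and once at c.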

We are now prepared to formally state the optimization problem we wish to solve. Let $\mathbb{R}^n_{\ge 0}$ denote the set
of vectors in $\mathbb{R}^n$ with all coordinates nonnegative. Our problem is parameterized by a function
$f : \mathbb{R}^n_{\ge 0} \to \mathbb{R}$.

$f$-Correlation Clustering
Input: A labeled graph $G$.

Output: A clustering $C$ of $G$.

Objective: Minimize $f(\mathrm{err}(C))$.

In order to approximate $f$-Correlation Clustering, we introduce a relaxed version of the problem.

Fractional $f$-Correlation Clustering
Input: A labeled graph $G$.

Output: A fractional clustering $x$ of $G$.

Objective: Minimize $f(\mathrm{err}(x))$.

If $f$ is convex on $\mathbb{R}^n_{\ge 0}$, then using standard techniques from convex optimization [5], the Fractional
$f$-Correlation Clustering problem can be approximately solved in polynomial time, as the composite function
$f \circ \mathrm{err}$ is convex and the constraints defining a fractional clustering are linear inequalities in the variables
$x_e$. When $G$ is a complete graph, we then employ a rounding algorithm based on the algorithm of Charikar,
Guruswami, and Wirth [9, 10] to transform the fractional clustering into a discrete clustering. Under rather
modest conditions on $f$, we are able to obtain a constant-factor bound on the error growth, that is, we can
produce a clustering $C$ such that $f(\mathrm{err}(C)) \le c\, f(\mathrm{err}(x))$, where $c$ is a constant not depending on $f$ or $x$. In
particular, we require the following assumptions on $f$.

Assumption A. We assume that $f : \mathbb{R}^n_{\ge 0} \to \mathbb{R}$ has the following properties.

(1) $f(cy) \le c f(y)$ for all $c \ge 0$ and all $y \in \mathbb{R}^n_{\ge 0}$, and

(2) If $y, z \in \mathbb{R}^n_{\ge 0}$ are vectors with $y_i \le z_i$ for all $i$, then $f(y) \le f(z)$.

Under Assumption A, the claim that $f(\mathrm{err}(C)) \le c\, f(\mathrm{err}(x))$ follows if we can show that
$\mathrm{err}(C)_v \le c\, \mathrm{err}(x)_v$ for every vertex $v \in V(G)$. This is the property we prove for our rounding algorithms.

We will slightly abuse terminology by referring to the constant $c$ as an approximation ratio for the
rounding algorithm; this terminology is motivated by the fact that when $f$ is linear, the Fractional $f$-Correlation
Clustering problem can be solved exactly in polynomial time, and applying a rounding algorithm with
constant $c$ to the fractional solution yields a $c$-approximation algorithm for the (discrete) $f$-Correlation
Clustering problem. In contrast, when $f$ is nonlinear, we may only be able to obtain a $(1 + \epsilon)$-approximation
for the Fractional $f$-Correlation Clustering problem, in which case applying the rounding algorithm yields a
$c(1 + \epsilon)$-approximation algorithm for the discrete problem.

A natural class of convex objective functions obeying Assumption A is the class of $\ell^p$ norms. For all
$p \ge 1$, the $\ell^p$-norm on $\mathbb{R}^n$ is defined by

$$\ell^p(x) = \Bigl( \sum_{i=1}^{n} |x_i|^p \Bigr)^{1/p}.$$

As $p$ grows larger, the $\ell^p$-norm puts more emphasis on the coordinates with larger absolute value. This
justifies the definition of the $\ell^\infty$-norm as

$$\ell^\infty(x) = \max\{x_1, \ldots, x_n\}.$$

Classical correlation clustering is the case of $f$-Correlation Clustering where $f(x) = \tfrac{1}{2}\ell^1(x)$, while minimax
correlation clustering is the case of $f$-Correlation Clustering where $f(x) = \ell^\infty(x)$.
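The two special cases can be made concrete with a short sketch. The function names and the sample error vector below are illustrative assumptions; the factor 1/2 in the classical objective reflects that each erroneous edge is counted at both of its endpoints.

```python
def lp_norm(y, p):
    """l^p norm of a vector y for finite p >= 1."""
    return sum(abs(t) ** p for t in y) ** (1.0 / p)

def classical_objective(err):
    # Each erroneous edge appears in the error count of both endpoints,
    # so half the l^1 norm is the total number of errors.
    return 0.5 * lp_norm(err, 1)

def minimax_objective(err):
    # l^infinity norm: the error count at the worst vertex.
    return max(abs(t) for t in err)

# A hypothetical per-vertex error vector.
err = [0.0, 1.0, 3.0, 2.0]
total_errors = classical_objective(err)
worst_vertex = minimax_objective(err)
large_p = lp_norm(err, 50)   # already very close to the maximum coordinate
```

For this vector both objectives happen to equal 3, and the value of `lp_norm(err, 50)` shows how the $\ell^p$-norm concentrates on the largest coordinate as $p$ grows.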

Our emphasis on convex $f$ is due to the fact that convex programming techniques allow the Fractional
$f$-Correlation Clustering problem to be approximately solved in polynomial time when $f$ is convex. However,
the correctness of our rounding algorithm does not depend on the convexity of $f$, only on the properties listed
in Assumption A. If $f$ is nonconvex and obeys Assumption A, and we produce a “good” fractional clustering
$x$ by some means, then our algorithm still produces a discrete clustering $C$ with $f(\mathrm{err}(C)) \le c\, f(\mathrm{err}(x))$.

3 A Rounding Algorithm for Complete Graphs 

We now describe a rounding algorithm to transform an arbitrary fractional clustering $x$ of a labeled complete
graph $G$ into a clustering $C$ such that $\mathrm{err}(C)_v \le c\, \mathrm{err}(x)_v$ for all $v \in V(G)$.

Our rounding algorithm is based on the algorithm of Charikar, Guruswami, and Wirth [9, 10] and is
shown in Algorithm 1. The main difference between Algorithm 1 and the algorithm of [9, 10] is the new
strategy of choosing a pivot vertex that maximizes $|T^*_u|$; in [9, 10], the pivot vertex is chosen arbitrarily.
Furthermore, the algorithm of [9, 10] always uses $\alpha = 1/2$ as a cutoff for forming “candidate clusters”, while
we express $\alpha$ as a parameter which we later choose in order to optimize the approximation ratio.

Under the classical objective function, an optimal fractional clustering is the solution to a linear program, 
which motivates the following notation for the more general case. 

Definition 3. If $uv$ is an edge of a labeled graph $G$, we define the LP-cost of $uv$ relative to a fractional
clustering $x$ to be $x_{uv}$ if $uv \in E^+$, and $1 - x_{uv}$ if $uv \in E^-$. Likewise, the cluster-cost of an edge $uv$ is 1 if
$uv$ is an error in the clustering produced by Algorithm 1, and 0 otherwise.

Our general strategy for obtaining the constant-factor error bound for Algorithm 1 is similar to that of
[9, 10]. Each time a cluster is output, we pay for the cluster-cost of the errors incurred by “charging” the
cost of these errors to the LP-costs of the fractional clustering. The main difference between our proof and
the proof of [9, 10] is that we must pay for errors locally: for each vertex $v$, we must pay for all clustering
errors incident to $v$ by charging to the LP cost incident to $v$. In particular, every clustering error must now
be paid for at each of its endpoints, while in [9, 10], it was enough to pay for each clustering error at one of
its endpoints. For edges which cross between a cluster and its complement, this requires a different analysis
at each endpoint, a difficulty which was not present in [9, 10]. Our proof emphasizes the solutions to these
new technical problems; the parts of the proof that are technically nontrivial but follow earlier work are
omitted due to space constraints but can be found in Appendix E.

Algorithm 1 Round fractional clustering $x$ to obtain a discrete clustering, using threshold parameters $\alpha, \gamma$
with $0 < \gamma < \alpha < 1/2$.

Let $S = V(G)$.
while $S \ne \emptyset$ do
    For each $u \in S$, let $T_u = \{w \in S - \{u\} : x_{uw} \le \alpha\}$ and let $T^*_u = \{w \in S - \{u\} : x_{uw} \le \gamma\}$.
    Choose a pivot vertex $u \in S$ that maximizes $|T^*_u|$.
    Let $T = T_u$.
    if $\sum_{w \in T} x_{uw} \ge \alpha |T| / 2$ then
        Output the cluster $\{u\}$. {Type 1 cluster}
        Let $S = S - \{u\}$.
    else
        Output the cluster $\{u\} \cup T$. {Type 2 cluster}
        Let $S = S - (\{u\} \cup T)$.
    end if
end while
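The pseudocode of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the dictionary-of-`frozenset` representation of $x$, the function name, and the tie-breaking rule for the pivot (first maximizer in list order) are assumptions, since the paper leaves tie-breaking unspecified.

```python
def round_clustering(x, vertices, alpha, gamma):
    """Sketch of Algorithm 1. `x` maps frozenset({u, v}) to a distance in [0, 1]."""
    assert 0 < gamma < alpha < 0.5
    S = list(vertices)          # remaining vertices, kept in a fixed order
    clusters = []
    while S:
        def near(u, r):
            # Remaining vertices other than u within distance r of u.
            return [w for w in S if w != u and x[frozenset((u, w))] <= r]

        # Pivot: a remaining vertex whose gamma-neighbourhood T*_u is largest
        # (ties broken by position in S, an assumption of this sketch).
        u = max(S, key=lambda v: len(near(v, gamma)))
        T = near(u, alpha)
        total = sum(x[frozenset((u, w))] for w in T)
        if total >= alpha * len(T) / 2:
            cluster = {u}                    # Type 1 cluster
        else:
            cluster = {u} | set(T)           # Type 2 cluster
        clusters.append(cluster)
        S = [v for v in S if v not in cluster]
    return clusters

# Two tight pairs that are far from each other.
V = ["a", "b", "c", "d"]
x = {frozenset((u, v)): 0.95 for u in V for v in V if u != v}
x[frozenset(("a", "b"))] = 0.05
x[frozenset(("c", "d"))] = 0.05
clusters = round_clustering(x, V, alpha=0.4, gamma=0.1)
```

On this input both pivots produce Type 2 clusters, recovering the two pairs. When all distances are moderately large (for example 0.3 with $\alpha = 0.4$), the averaging test succeeds and every vertex is output as a Type 1 singleton.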

Observation 4. Let $x$ be a fractional clustering of a graph $G$, and let $w, z \in V(G)$. For any vertex $u$, we
have $x_{wz} \ge x_{uz} - x_{uw}$ and $1 - x_{wz} \ge 1 - x_{uz} - x_{uw}$.

Theorem 5. Let $G$ be a labeled complete graph, let $\alpha$ and $\gamma$ be parameters with $0 < \gamma < \alpha < 1/2$, and let
$x$ be any fractional clustering of $G$. If $C$ is the clustering produced by Algorithm 1 with the given input, then
for all $v \in V(G)$ we have $\mathrm{err}(C)_v \le c\, \mathrm{err}(x)_v$, where $c$ is a constant depending only on $\alpha$ and $\gamma$.

Proof. Let $k_1, k_2, k_3$ be constants to be determined, with $1/2 < k_1 < 1$ and $0 < 2k_2 \le k_3 < 1/2$. Also assume
that $k_1\alpha > \gamma$ and that $k_2\alpha \le 1 - 2\alpha$.

To prove the approximation ratio, we consider the cluster-costs incurred as each cluster is output, splitting 
into cases according to the type of cluster. In our analysis, as the algorithm runs, we will mark certain 
vertices as “safe”, representing the fact that some possible future clustering costs have been paid for in 
advance. Initially, no vertex is marked as safe. 

Case 1: A Type 1 cluster is output. Let $X = S \cap N^+(u)$, with $S$ as in Algorithm 1. The new cluster-cost
incurred at $u$ is $|X|$, and for each $v \in X$, a new cluster-cost of 1 is incurred at $v$.

First we pay for the new cluster cost incurred at $u$. For each edge $uv$ with $v \in T$, we have $x_{uv} \le \alpha$ and
so $1 - x_{uv} \ge 1 - \alpha > x_{uv}$. Thus, the total LP cost of edges $uv$ with $v \in T$ is at least $\sum_{v \in T} x_{uv}$, which is
at least $\alpha |T| / 2$ since $\{u\}$ is output as a Type 1 cluster. Thus, charging each edge $uv$ with $v \in T$ a total of
$2/\alpha$ times its LP-cost pays for the cluster-cost of any positive edges from $u$ to $T$. On the other hand, if $uv$
is a positive edge with $v \in S - T$, then since $v \notin T$, we have $x_{uv} > \alpha$. Hence, the LP-cost of $uv$ is at least
$\alpha$, and charging $1/\alpha$ times the LP-cost of $uv$ pays for the cluster-cost of this edge.

Now let $v \in X$; we must pay for the new cluster cost at $v$. If $x_{uv} \ge k_2\alpha$, then the edge $uv$ already incurs
LP cost at least $k_2\alpha$, so the new cost at $v$ is only $1/(k_2\alpha)$ times the LP-cost of the edge $uv$. So assume
$x_{uv} < k_2\alpha$. In this case, we say that $u$ is a bad pivot for $v$.

First suppose that $v$ is not safe (as is initially the case). We will make a single charge to the edges
incident to $v$ that is large enough to pay for both the edge $uv$ and for all possible future bad pivots, and
then we will mark $v$ as safe to indicate that we have done this. The basic idea is that if $v$ has many possible
bad pivots, then since $x_{uv}$ is “small”, all of these possible bad pivots are also close to $u$, thus included in
$T_u$. Since $\sum_{w \in T_u} x_{uw} \ge \alpha |T_u| / 2$, there is a large set $B \subseteq T_u$ of vertices that are “moderately far” from $u$,
and therefore moderately far from $v$. The number of these vertices grows with the number of bad pivots, so
charging all the edges $vz$ for $z \in B$ is sufficient to pay for all bad pivots.

We now make this argument rigorous. Let $P_v$ be the set of potential bad pivots for $v$, defined by

$$P_v = \{p \in S : x_{vp} < k_2\alpha\}.$$
Note that $u \in P_v$. Since $k_2 < 1/4$, we have $x_{up} \le x_{uv} + x_{vp} < \alpha/2$ for all $p \in P_v$; hence $P_v \subseteq T$. Define the
vertex set $B$ by

$$B = \{z \in T : x_{uz} \ge k_3\alpha\}.$$

Since $x_{uz} \le \alpha$ for all $z \in T$, we see that

$$\sum_{z \in T} x_{uz} \le k_3\alpha\, |T - B| + \alpha\, |B|.$$

On the other hand, since $\{u\}$ is output as a Type 1 cluster, we have

$$\sum_{z \in T} x_{uz} \ge \alpha |T| / 2.$$

Combining these inequalities and rearranging, we obtain $|B| \ge (1 - 2k_3) |T - B|$. For each vertex $z \in B$, we
have $x_{vz} \ge x_{uz} - x_{uv} \ge (k_3 - k_2)\alpha$; in particular, since $k_3 \ge 2k_2$, we have $x_{vz} \ge k_2\alpha$, so that $z \notin P_v$. Hence
$|T - B| \ge |P_v|$, and we have $|B| \ge (1 - 2k_3) |P_v|$.
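The rearrangement behind the bound on $|B|$ can be spelled out as follows. This is a sketch of the omitted algebra, using only the two inequalities on $\sum_{z \in T} x_{uz}$ stated above and the fact that $|T| = |T - B| + |B|$ because $B \subseteq T$.

```latex
\begin{align*}
\frac{\alpha}{2}\bigl(|T - B| + |B|\bigr)
  &\le \sum_{z \in T} x_{uz}
   \le k_3\alpha\,|T - B| + \alpha\,|B| \\
\Longrightarrow\quad
\Bigl(\tfrac{1}{2} - k_3\Bigr)\,|T - B| &\le \tfrac{1}{2}\,|B| \\
\Longrightarrow\quad
|B| &\ge (1 - 2k_3)\,|T - B|.
\end{align*}
```

The middle step divides by $\alpha > 0$ and collects terms; the assumption $k_3 < 1/2$ keeps the factor $1 - 2k_3$ positive, so the bound is nontrivial.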

On the other hand, for $z \in B$ we also have $1 - x_{vz} \ge 1 - x_{uv} - x_{uz} \ge 1 - (1 + k_2)\alpha$. It follows that
each edge $vz$ for $z \in B$ has LP-cost at least $\min((k_3 - k_2)\alpha,\ 1 - (1 + k_2)\alpha)$, independent of whether $vz$ is
positive or negative. It is easy to check that since $\alpha < 1/2$ and $k_3 < 1$, this minimum is always achieved
by $(k_3 - k_2)\alpha$. Therefore, we can pay for the (possible) Type-1-cluster cost of all edges $vp$ for $p \in P_v$ by
charging each edge $vz$ with $z \in B$ a total of

$$\frac{1}{(1 - 2k_3)(k_3 - k_2)\alpha}$$

times its LP-cost. We make all these charges when the cluster $\{u\}$ is created and put them in a “bank
account” to pay for later Type-1-cluster costs for $v$. Then we mark $v$ as safe. The total charge in the bank
account is at least $|P_v|$, which is enough to pay for all bad pivots for $v$.

We have just described the case where u is a bad pivot and v is not safe. On the other hand, if u is a 
bad pivot and v is safe, then v already has a bank account large enough to pay for all its bad pivots, and we 
simply charge 1 to the account to pay for the edge uv. 

Case 2: A Type 2 cluster $\{u\} \cup T$ is output. The negative edges within $\{u\} \cup T$ are easy to pay for: if
$vw$ is a negative edge inside $\{u\} \cup T$, then we have $1 - x_{vw} \ge 1 - x_{uv} - x_{uw} \ge 1 - 2\alpha$, so we can pay for each
of these edges by charging a factor of $1/(1 - 2\alpha)$ times its LP-cost.

Thus, we consider edges joining $\{u\} \cup T$ with $S - (\{u\} \cup T)$. We call these edges cross-edges for their
endpoints. A standard argument (see Appendix E) shows that for $z \in S - (\{u\} \cup T)$, the total cluster-cost
of the cross-edges for $z$ is at most $\max\{1/(1 - 2\alpha),\ 2/\alpha\}$ times the LP-cost of those edges, so the vertices
outside $\{u\} \cup T$ can be dealt with easily.

However, we also must bound the cluster-cost at vertices inside $\{u\} \cup T$. This is where we use the
maximality of $|T^*_u|$.

Let $w \in \{u\} \cup T$. First consider the positive cross-edges $wz$ such that $x_{wz} \ge \gamma$. Any such edge has
cluster-cost 1 and already has LP-cost at least $\gamma$, so charging $1/\gamma$ times the LP-cost to such an edge pays for
its cluster cost. Now let $X = \{z \in S - (\{u\} \cup T) : x_{wz} < \gamma\}$; we still must pay for the edges $wz$ with $z \in X$.

If $x_{uw} \le k_1\alpha$, which includes the case $u = w$, then for all $z \in X$, we have
$x_{wz} \ge x_{uz} - x_{uw} \ge \alpha - k_1\alpha = (1 - k_1)\alpha$. Hence, for any positive edge $wz$ with $z \in X$, the LP-cost of $wz$
is at least $(1 - k_1)\alpha$, and so the cluster cost of the edge $wz$ is at most $1/((1 - k_1)\alpha)$ times the LP cost.
Charging this factor to each cross-edge pays for the cluster-cost of each cross-edge.

Now suppose $x_{uw} > k_1\alpha$. Since $k_1\alpha > \gamma$, this implies $w \notin T^*_u$. In this case, it is possible that $w$ may have
many positive neighbors $z \in X$ for which $x_{wz}$ is quite small, so we cannot necessarily pay for the cluster-cost
of the edges joining $w$ and $X$ by using their LP-cost. Instead, we charge their cluster-cost to the LP-cost of
edges within $T$.

Observe that $X \subseteq T^*_w$, and hence $|T^*_w| \ge |X|$. By the maximality of $|T^*_u|$, this implies that $|T^*_u| \ge |X|$.
Now for any $v \in T^*_u$, we have the following bounds:

$$x_{wv} \ge x_{uw} - x_{uv} \ge k_1\alpha - \gamma,$$

$$1 - x_{wv} \ge 1 - x_{uw} - x_{uv} \ge 1 - \alpha - \gamma.$$

Since $\alpha < 1/2$ and $k_1 < 1$, we have $k_1\alpha < \alpha < 1 - \alpha$, so these lower bounds imply that each edge $wv$ with
$v \in T^*_u$ has LP-cost at least $k_1\alpha - \gamma$, independent of whether $wv$ is a positive or negative edge. Thus, the
total LP cost of edges joining $w$ to $T^*_u$ is at least $(k_1\alpha - \gamma) |T^*_u|$.

Since the total cluster-cost of edges joining $w$ and $X$ is at most $|X|$ and since $|T^*_u| \ge |X|$, we can pay for
these edges by charging each edge $wv$ with $v \in T^*_u$ a factor of $1/(k_1\alpha - \gamma)$ times its LP-cost.

Having paid for all cluster-costs, we now look at the total charge accrued at each vertex. Fix any vertex
$v$ and an edge $vw$ incident to $v$. We bound the total amount charged to $vw$ by $v$ in terms of the LP-cost of
$vw$. There are three distinct possibilities for the edge $vw$: either $vw$ ended inside a cluster, or $v$ was clustered
before $w$, or $w$ was clustered before $v$.

Case 1: $vw$ ended within a cluster. In this case, $v$ may have made the following charges:

• A charge of $\frac{1}{(1 - 2k_3)(k_3 - k_2)\alpha}$ times the LP-cost, to pay for a “bank account” for $v$,

• A charge of $\frac{1}{1 - 2\alpha}$ times the LP-cost, to pay for $vw$ itself if $vw$ is a negative edge,

• A charge of $\frac{1}{k_1\alpha - \gamma}$ times the LP-cost, to pay for positive edges leaving the $v$-cluster.

Thus, in this case the total cost charged to $vw$ by $v$ is at most $c_1$ times the LP-cost of $vw$, where

$$c_1 = \frac{1}{(1 - 2k_3)(k_3 - k_2)\alpha} + \frac{1}{1 - 2\alpha} + \frac{1}{k_1\alpha - \gamma}.$$

Case 2: $v$ was clustered before $w$. In this case, $v$ may have made the following charges:

• A charge of $\frac{1}{(1 - 2k_3)(k_3 - k_2)\alpha}$ times the LP-cost, to pay for a “bank account” for $v$,

• A charge of at most $\frac{2}{\alpha}$ times the LP-cost, to pay for all cross-edges if $v$ was output as a Type 1 cluster,

• A charge of at most $\max\left\{\frac{1}{(1 - k_1)\alpha}, \frac{1}{\gamma}\right\}$ times the LP-cost, to pay for $vw$ if $v$ was output in a Type 2
cluster.

Note that $k_1 > 1/2$ implies that $\frac{2}{\alpha} < \frac{1}{(1 - k_1)\alpha}$, so we may disregard the case where $v$ is output as a Type 1
cluster. Thus, in this case the total cost charged to $vw$ by $v$ is at most $c_2$ times the LP-cost of $vw$, where

$$c_2 = \frac{1}{(1 - 2k_3)(k_3 - k_2)\alpha} + \max\left\{\frac{1}{(1 - k_1)\alpha}, \frac{1}{\gamma}\right\}.$$

Case 3: $w$ was clustered before $v$. In this case, $v$ may have made the following charges:

• A charge of at most $\frac{1}{(1 - 2k_3)(k_3 - k_2)\alpha}$ times the LP-cost, to pay for a “bank account” for $v$,

• A charge of at most $\frac{1}{k_2\alpha}$ times the LP-cost, to pay for the cluster-cost of $vw$ if $vw$ is a positive edge
and $w$ was output as a Type 1 cluster,

• A charge of at most $\max\left\{\frac{1}{1 - 2\alpha}, \frac{2}{\alpha}\right\}$ times the LP-cost, to pay for $vw$ if $w$ was output in a Type 2 cluster.

Clearly $vw$ cannot receive both the second and third types of charge. Furthermore, since $k_2 < 1/4$, we have
$\frac{1}{k_2\alpha} > \frac{2}{\alpha}$. Since $k_2\alpha \le 1 - 2\alpha$, we see that $\frac{1}{k_2\alpha}$ is the largest charge that $vw$ could receive from either the
second or third type of charge. Thus, in this case the total cost charged to $vw$ by $v$ is at most $c_3$ times the
LP-cost, where

$$c_3 = \frac{1}{(1 - 2k_3)(k_3 - k_2)\alpha} + \frac{1}{k_2\alpha}.$$

Thus, the approximation ratio of the algorithm is at most $\max\{c_1, c_2, c_3\}$. We wish to choose the various
parameters to make this ratio as small as possible, subject to the various assumptions on the parameters
required for the correctness of the proof. It seems difficult to obtain an exact solution to this optimization
problem. Solving the problem numerically, we obtained the following values for the parameters:

$\alpha = 0.465744$, $\gamma = 0.0887449$,

$k_1 = 0.767566$, $k_2 = 0.117219$, $k_3 = 0.308433$.

These parameters yield an approximation ratio of roughly 48. □
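The stated ratio can be checked by evaluating the three bounds at the reported parameter values. The sketch below assumes the expressions for $c_1$, $c_2$, $c_3$ as they are derived in the three cases of the proof (a common "bank account" term plus a case-specific term); the function name is illustrative.

```python
def ratio_bounds(alpha, gamma, k1, k2, k3):
    """Evaluate the bounds c1, c2, c3 from the proof of Theorem 5."""
    # Feasibility conditions assumed in the proof.
    assert 0 < gamma < alpha < 0.5
    assert 0.5 < k1 < 1 and 0 < 2 * k2 <= k3 < 0.5
    assert k1 * alpha > gamma and k2 * alpha <= 1 - 2 * alpha

    # Charge that funds the "bank account" of a vertex; common to all cases.
    bank = 1 / ((1 - 2 * k3) * (k3 - k2) * alpha)
    c1 = bank + 1 / (1 - 2 * alpha) + 1 / (k1 * alpha - gamma)
    c2 = bank + max(1 / ((1 - k1) * alpha), 1 / gamma)
    c3 = bank + 1 / (k2 * alpha)
    return c1, c2, c3

c1, c2, c3 = ratio_bounds(alpha=0.465744, gamma=0.0887449,
                          k1=0.767566, k2=0.117219, k3=0.308433)
ratio = max(c1, c2, c3)
print(round(ratio, 2))
```

With these values $c_1$ and $c_3$ nearly coincide at about 47.6 and $c_2$ is smaller, which is consistent with the ratio of "roughly 48" and with a numerical optimum that balances the binding cases.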


4 A Rounding Algorithm for One-Sided Biclustering 

In this section, we consider a version of the $f$-Correlation Clustering problem on complete bipartite graphs.
Let $G$ be a complete bipartite graph with edges labeled + and −, and let $V_1$ and $V_2$ be its partite sets. We
will obtain a rounding algorithm that transforms any fractional clustering $x$ into a discrete clustering $C$ such
that $\mathrm{err}(C)_v \le c\, \mathrm{err}(x)_v$ for all $v \in V_1$. Our algorithm is shown in Algorithm 2.

Our algorithm does not guarantee any upper bound on $\mathrm{err}(C)_v$ for $v \in V_2$: as the algorithm treats the
sides $V_1$ and $V_2$ asymmetrically, it is difficult to control the per-vertex error at $V_2$. Nevertheless, an error
guarantee for the vertices in $V_1$ suffices for some applications. Our approach is motivated by applications in
recommender systems, where vertices in $V_1$ correspond to users, while vertices in $V_2$ correspond to objects
to be ranked. In this context, quality of service conditions only need to be imposed for users, and not for
objects.


Algorithm 2 Round fractional clustering to obtain a discrete clustering, using threshold parameters $\alpha, \gamma$
with $\alpha < 1/2$ and $\gamma < \alpha$.

Let $S = V(G)$.
while $V_1 \cap S \ne \emptyset$ do
    For each $u \in V_1 \cap S$, let $T_u = \{w \in S - \{u\} : x_{uw} \le \alpha\}$ and let $T^*_u = \{w \in V_2 \cap S : x_{uw} \le \gamma\}$.
    Choose a pivot vertex $u \in V_1 \cap S$ that maximizes $|T^*_u|$.
    Let $T = T_u$.
    if $\sum_{w \in V_2 \cap T} x_{uw} \ge \alpha |V_2 \cap T| / 2$ then
        Output the singleton cluster $\{u\}$. {Type 1 cluster}
        Let $S = S - \{u\}$.
    else
        Output the cluster $\{u\} \cup T$. {Type 2 cluster}
        Let $S = S - (\{u\} \cup T)$.
    end if
end while
Output each remaining vertex of $V_2 \cap S$ as a singleton cluster.
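A Python sketch of Algorithm 2 follows. As with any such sketch, the details are assumptions made for illustration: $x$ is stored as a dictionary keyed by `frozenset({u, v})` and is assumed to be defined on all pairs of vertices (including pairs on the same side), the function name is hypothetical, and ties among pivots are broken by list order.

```python
def round_biclustering(x, V1, V2, alpha, gamma):
    """Sketch of Algorithm 2; pivots are always taken from V1."""
    assert 0 < gamma < alpha < 0.5
    S = list(V1) + list(V2)      # remaining vertices
    side2 = set(V2)
    clusters = []

    def dist(u, w):
        return x[frozenset((u, w))]

    while any(v not in side2 for v in S):
        pivots = [v for v in S if v not in side2]
        # T*_u counts only remaining V2 vertices within distance gamma of u.
        u = max(pivots,
                key=lambda p: sum(1 for w in S if w in side2 and dist(p, w) <= gamma))
        T = [w for w in S if w != u and dist(u, w) <= alpha]
        T2 = [w for w in T if w in side2]
        # The averaging test only looks at the V2 part of T.
        if sum(dist(u, w) for w in T2) >= alpha * len(T2) / 2:
            cluster = {u}                  # Type 1 cluster
        else:
            cluster = {u} | set(T)         # Type 2 cluster
        clusters.append(cluster)
        S = [v for v in S if v not in cluster]
    clusters.extend({v} for v in S)        # leftover V2 vertices become singletons
    return clusters

# Two viewers, two movies; each viewer is close to exactly one movie.
V1, V2 = ["u1", "u2"], ["m1", "m2"]
x = {frozenset((a, b)): 0.9 for a in V1 + V2 for b in V1 + V2 if a != b}
x[frozenset(("u1", "m1"))] = 0.05
x[frozenset(("u2", "m2"))] = 0.05
biclusters = round_biclustering(x, V1, V2, alpha=0.4, gamma=0.1)
```

In this example each pivot from $V_1$ absorbs its nearby movie as a Type 2 cluster, and no vertex of $V_2$ is left over for the final singleton step.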


Theorem 6. Let $G$ be a labeled complete bipartite graph with partite sets $V_1$ and $V_2$, let $\alpha, \gamma$ be parameters
as described in Algorithm 2, and let $x$ be any fractional clustering of $G$. If $C$ is the clustering produced by
Algorithm 2 with the given input, then for all $v \in V_1$ we have $\mathrm{err}(C)_v \le c\, \mathrm{err}(x)_v$, where $c$ is a constant
depending only on $\alpha$ and $\gamma$.

We note that the proof of Theorem 6 is actually simpler than the proof of Theorem 5, because the focus
on errors only at $V_1$ eliminates the need for the “bad pivots” argument used in Theorem 5. This also leads
to a smaller value of $c$ in Theorem 6 than we were able to obtain in Theorem 5.

Proof. As before, we make charges to pay for the new cluster costs at each vertex of $V_1$ as each cluster is
output, splitting into cases according to the type of cluster. Let $k_1$ be a constant to be determined, with
$k_1\alpha > \gamma$.

Case 1: A Type 1 cluster $\{u\}$ is output. In this case, the only cluster costs incurred are the positive
edges incident to $u$, all of which have their other endpoint in $V_2$. The averaging argument used in Case 1 of
Section 3 shows that charging every edge incident to $u$ a factor of $2/\alpha$ times its LP cost pays for the cluster
cost of all such edges.





Case 2: A Type 2 cluster $\{u\} \cup T$ is output. Negative edges within the cluster are easy to pay for: if
$w_1 w_2$ is a negative edge within the cluster, with $w_i \in V_i$, then we have

$$1 - x_{w_1 w_2} \ge 1 - x_{u w_1} - x_{u w_2} \ge 1 - 2\alpha,$$

so we can pay for the cluster-cost of such an edge by charging it a factor of $1/(1 - 2\alpha)$ times its LP-cost.

We still must pay for positive edges joining the cluster with the rest of $S$; we call such edges cross-edges.
Each such edge must be paid for at its endpoint in $V_1$.

If $z \in V_1$ is a vertex outside the cluster, then a standard argument (see Appendix E) shows that the
cross-edges for $z$ can be paid for by charging each such edge a factor of $\max\{1/(1 - 2\alpha),\ 2/\alpha\}$ times its LP
cost.

Now let $w \in V_1$ be a vertex inside the cluster. We must pay for the cross-edges incident to $w$ using the
LP-cost of the edges incident to $w$. First consider the positive edges from $w$ to vertices $z$ outside the cluster
such that $x_{wz} \ge \gamma$. Any such edge has cluster-cost 1 and LP-cost at least $\gamma$, so charging each such edge a
factor of $1/\gamma$ times its LP-cost pays for its cluster cost. Let $X = \{z \in (S \cap V_2) - T : x_{wz} < \gamma\}$; we must pay
for the edges $wz$ with $z \in X$. Note that $x_{uz} > \alpha$ for all $z \in X$, since $z \in X$ implies $z \notin T$.

If $x_{uw} \le k_1\alpha$, then for all $z \in X$, we have

$$x_{wz} \ge x_{uz} - x_{uw} \ge (1 - k_1)\alpha.$$

Hence, for any positive cross-edge $wz$ with $z \in X$, the LP-cost of $wz$ is at least $(1 - k_1)\alpha$, and so we can pay
for the cluster-cost of $wz$ by charging $wz$ a factor of $1/((1 - k_1)\alpha)$ times its LP-cost.

Now suppose $x_{uw} > k_1\alpha$. As before, we pay for the cross-edges by charging the edges inside the cluster.
Observe that $|T^*_w| \ge |X|$. Since $u$ was chosen to maximize $|T^*_u|$, this implies that $|T^*_u| \ge |X|$. For any $v \in T^*_u$,
we have

$$x_{wv} \ge x_{uw} - x_{uv} \ge k_1\alpha - \gamma.$$

On the other hand, for any $v \in T^*_u$ we also have

$$1 - x_{wv} \ge 1 - x_{uw} - x_{uv} \ge 1 - \alpha - \gamma \ge \alpha - \gamma.$$

Since $k_1 < 1$, it follows that the edge $wv$ has LP-cost at least $k_1\alpha - \gamma$, independent of whether $wv$ is positive
or negative. Thus, the total LP cost of edges joining $w$ to $T^*_u$ is at least $(k_1\alpha - \gamma) |T^*_u|$.

Since the total cluster-cost of the cross-edges joining $w$ and $X$ is at most $|X|$ and since $|T^*_u| \ge |X|$, we
can pay for the cross-edges by charging each edge $wv$ with $v \in T^*_u$ a factor of $1/(k_1\alpha - \gamma)$ times its LP-cost.

Having paid for all cluster-costs, we now look at the total charge accrued at each vertex. Fix a vertex $v \in V_1$ and an edge $vw$ incident to $v$. We bound the total amount charged to $vw$ by $v$ in terms of the LP-cost of $vw$. There are three distinct possibilities for the edge $vw$: either $vw$ ended inside a cluster, or $v$ was clustered before $w$, or $w$ was clustered before $v$.

Case 1: $vw$ ended within a cluster. In this case, $v$ may have made the following charges:

• A charge of at most $1/(1-2\alpha)$ times the LP-cost, to pay for $vw$ itself if $vw$ is a negative edge,

• A charge of $1/(k_1\alpha - \gamma)$ times the LP-cost, to pay for positive edges leaving the cluster.

Thus, in this case the total cost charged to $vw$ by $v$ is at most $c_1$ times the LP-cost of $vw$, where

\[ c_1 = \frac{1}{1-2\alpha} + \frac{1}{k_1\alpha - \gamma}. \]

Case 2: $v$ was clustered before $w$. In this case, $v$ may have made the following charges:

• A charge of $2/\alpha$ times the LP-cost, to pay for $vw$ if $v$ was output as a singleton,

• A charge of $\max\{1/((1-k_1)\alpha),\, 1/\gamma\}$ times the LP-cost, to pay for $vw$ if $v$ was output in a nonsingleton cluster.

Since $v$ makes at most one of the charges above, the total cost charged to $vw$ by $v$ is at most $c_2$ times the LP-cost of $vw$, where

\[ c_2 = \max\left\{ \frac{1}{(1-k_1)\alpha},\ \frac{1}{\gamma},\ \frac{2}{\alpha} \right\}. \]

Case 3: $w$ was clustered before $v$. In this case, $v$ may have made the following charges:

• A charge of at most $\max\{1/(1-2\alpha),\, 2/\alpha\}$ times the LP-cost, to pay for cross-edges at $v$ if $w$ is output in a nonsingleton cluster.

Thus, in this case the total cost charged to $vw$ by $v$ is at most $c_3$ times the LP-cost of $vw$, where

\[ c_3 = \max\left\{ \frac{1}{1-2\alpha},\ \frac{2}{\alpha} \right\}. \]

The approximation ratio is $\max\{c_1, c_2, c_3\}$. Numerically, we obtain an approximation ratio of at most 10 by taking the following parameter values:

\[ \alpha = 0.377, \qquad \gamma = 0.102, \qquad k_1 = 0.730. \] □
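
The numerical claim can be checked directly. The sketch below is only an illustration: it assumes the three case bounds read $c_1 = 1/(1-2\alpha) + 1/(k_1\alpha-\gamma)$, $c_2 = \max\{1/((1-k_1)\alpha), 1/\gamma, 2/\alpha\}$ and $c_3 = \max\{1/(1-2\alpha), 2/\alpha\}$, which is our reading of the charges listed in the three cases, and the function name `charge_bounds` is ours.

```python
# Parameter values quoted at the end of the proof.
alpha, gamma, k1 = 0.377, 0.102, 0.730

def charge_bounds(alpha, gamma, k1):
    """Return (c1, c2, c3), the per-edge charge bounds of the three cases.

    Assumption: the formulas follow our reading of the charges in the proof.
    """
    c1 = 1 / (1 - 2 * alpha) + 1 / (k1 * alpha - gamma)
    c2 = max(1 / ((1 - k1) * alpha), 1 / gamma, 2 / alpha)
    c3 = max(1 / (1 - 2 * alpha), 2 / alpha)
    return c1, c2, c3

c1, c2, c3 = charge_bounds(alpha, gamma, k1)
ratio = max(c1, c2, c3)
print(f"c1={c1:.3f} c2={c2:.3f} c3={c3:.3f} ratio={ratio:.3f}")
```

Under these assumptions the three bounds are nearly balanced between $c_1$ and $c_2$, which is what one would expect from parameters tuned to minimize the maximum.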
A Minimax Clustering and the Failure of Pivoting Algorithms

In this appendix, we consider minimax clustering, which is the special case of $f$-Correlation Clustering where $f(y) = \max_{v \in V(G)} y_v$. Thus, in minimax clustering, we seek to minimize the number of errors at the worst vertex in the clustering. Equivalently, we are trying to minimize the $\ell^\infty$-norm of the error vector, in contrast to classical correlation clustering, where we are trying to minimize the $\ell^1$-norm.

Minimax clustering is a representative example of the difficulties which arise in moving from classical correlation clustering to the more general $f$-Correlation Clustering problem. We will show that some techniques which work well for the classical correlation clustering problem break down in the minimax context.

Ailon, Charikar, and Newman [?] gave a beautifully simple randomized 3-approximation algorithm for classical correlation clustering on complete graphs. Their algorithm is shown in Algorithm 3. Since our rounding algorithm in Section 3 is based on the Charikar-Guruswami-Wirth algorithm with a modified pivoting rule, it is natural to ask whether a similar modification to the Ailon-Charikar-Newman algorithm also yields a constant-factor approximation algorithm for minimax clustering.


Algorithm 3 Ailon-Charikar-Newman algorithm [?].
  Let $S = V(G)$.
  while $S \ne \emptyset$ do
    Pick $v \in S$ uniformly at random.
    Let $T = (\{v\} \cup N^+(v)) \cap S$.
    Output the cluster $T$.
    Let $S = S - T$.
  end while
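
The pivoting loop of Algorithm 3 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names, the `positive(u, v)` labeling callback, and the encoding of a matching-type instance (negative edges $\{2i, 2i+1\}$, all other edges positive) are our assumptions.

```python
import random

def acn_pivot(vertices, positive, rng=None):
    """Pivot clustering: repeatedly pick a random pivot and cluster it
    with all of its remaining positive neighbors.

    `positive(u, v)` is an assumed callback returning True for + edges.
    """
    rng = rng or random.Random()
    remaining = set(vertices)
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))  # sorted for reproducibility
        cluster = {pivot} | {w for w in remaining
                             if w != pivot and positive(pivot, w)}
        clusters.append(cluster)
        remaining -= cluster
    return clusters

def matching_labels(t):
    """Labels on 2t vertices: edges {2i, 2i+1} are negative, the rest positive."""
    return lambda u, v: u // 2 != v // 2

t = 5
clusters = acn_pivot(range(2 * t), matching_labels(t), random.Random(0))
print(sorted(len(c) for c in clusters))
```

On this matching-type instance the first pivot takes every vertex except its negative neighbor, so the output always consists of one cluster of size $2t-1$ and one singleton, whatever the pivot order.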


Unfortunately, it seems that there are severe obstacles to modifying the ACN algorithm in this manner. For any positive integer $t$, let $M_t$ be a graph on $2t$ vertices consisting of $t$ pairwise disjoint edges, and let $G_t$ be the labeling of $K_{2t}$ in which the edges of $M_t$ are labeled $-$ and all other edges are labeled $+$.

Clearly, if all vertices of $G_t$ are placed in the same cluster (the “giant clustering”), then there is only 1 error at each vertex of $G_t$. We show that all other clusterings of $G_t$ have many more errors at some vertex.

Lemma 7. If $\mathcal{C}$ is a clustering of $G_t$ with more than 1 cluster, then some vertex of $G_t$ has at least $t-1$ errors in $\mathcal{C}$.

Proof. Let $X$ be the smallest cluster in $\mathcal{C}$. Since $\mathcal{C}$ has at least 2 clusters, we have $|X| \le t$, so at least $t$ vertices lie outside $X$. For any $v \in X$, there is at most one $w \notin X$ such that $vw$ is a negative edge, so at least $t-1$ of the edges from $v$ to vertices outside $X$ are positive. Hence, each $v \in X$ has at least $t-1$ incident errors. □
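
For small $t$, Lemma 7 can be confirmed by exhaustive search. The following sketch enumerates every partition of the $2t$ vertices; the encoding of $G_t$ (negative edges $\{2i, 2i+1\}$) and all function names are our own illustrative choices.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def worst_vertex_errors(t, partition):
    """Largest number of errors at any vertex of G_t under `partition`."""
    block_of = {v: i for i, block in enumerate(partition) for v in block}
    worst = 0
    for v in range(2 * t):
        errors = 0
        for w in range(2 * t):
            if w == v:
                continue
            positive = v // 2 != w // 2          # matching edges are negative
            same = block_of[v] == block_of[w]
            if positive != same:                 # + across, or - within
                errors += 1
        worst = max(worst, errors)
    return worst

def check_lemma7(t):
    """Return True if every non-giant clustering has a vertex with >= t-1 errors."""
    for partition in set_partitions(list(range(2 * t))):
        worst = worst_vertex_errors(t, partition)
        if len(partition) == 1 and worst != 1:
            return False
        if len(partition) > 1 and worst < t - 1:
            return False
    return True

print(check_lemma7(3))
```

The search is exponential in $t$ (it visits all set partitions), so it is only meant as a sanity check for very small instances.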

By Lemma 7, any constant-factor randomized algorithm for minimax clustering must return the giant clustering for $G_t$ with probability $1 - O(1/t)$. On the other hand, if we modify Algorithm 3 by changing the rule for choosing the pivot vertex $v$, the resulting algorithm still cannot produce the giant clustering, since the first cluster output never contains the negative neighbor of the first pivot. It is difficult to see how Algorithm 3 could sensibly be modified in order to return the giant clustering for $G_t$ with high enough probability.

We now consider the behavior of Algorithm 1 on the graph $G_t$. While the minimax objective function is not linear in the variables $x_{uv}$, we can still model the $f$-Fractional Correlation Clustering problem using the linear program $L$ shown in Figure 1.

Since the algorithm presented in Section 3 yields a constant-factor approximation algorithm for minimax clustering, and since every clustering of $G_t$ other than the giant clustering has $t-1$ errors at some vertex, it is necessary that our rounding algorithm, applied to an optimal solution of $L$, returns the giant clustering for all sufficiently large $t$. This follows immediately from the following result.

Proposition 8. Let $L$ be the linear program shown in Figure 1, as formulated for $G_t$. If $t \ge 3$, then the unique optimal solution to $L$ has $x_{uv} = 0$ for all $uv \in E(G)$.

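Proof. The dual program to $L$ is shown in Figure 2, with the following variables:

• For each $v \in V(G_t)$, a variable $\pi_v$ corresponding to the constraint $\sum_{w \in N^+(v)} x_{vw} + \sum_{w \in N^-(v)} (1 - x_{vw}) \le M$,

• For each ordered triple $(u,v,z)$ where $u,v,z$ are distinct vertices of $V(G_t)$, a variable $\sigma_{u,v,z}$ corresponding to the constraint $x_{uv} \le x_{uz} + x_{zv}$.

For convenience of notation, we also introduce the abbreviation $\sigma_{u,v}$ to stand for

\[ \sigma_{u,v} = \sum_{z \in V(G) - \{u,v\}} \left( -\sigma_{u,v,z} - \sigma_{v,u,z} + \sigma_{z,u,v} + \sigma_{z,v,u} + \sigma_{u,z,v} + \sigma_{v,z,u} \right). \]

Observe that there are exactly $2t-2$ choices of $z$ to sum over.

\[
\begin{array}{ll}
\text{minimize } M, \text{ subject to:} & \\[2pt]
\quad x_{uv} \le x_{uz} + x_{zv} & \text{(for all distinct } u, v, z) \\[2pt]
\quad \displaystyle\sum_{w \in N^+(v)} x_{vw} + \sum_{w \in N^-(v)} (1 - x_{vw}) \le M & \text{(for all } v \in V(G)) \\[2pt]
\quad 0 \le x_e \le 1 & \text{(for all } e \in E(G)) \\[2pt]
\quad M \in \mathbb{R} &
\end{array}
\]

Figure 1: LP formulation $L$ of $f$-Fractional Correlation Clustering, where $f(y) = \max_{v \in V(G)} y_v$.

\[
\begin{array}{ll}
\text{maximize } \displaystyle\sum_{v \in V(G)} d^-(v)\,\pi_v, \text{ subject to:} & \\[2pt]
\quad -\pi_u - \pi_v + \sigma_{u,v} \le 0 & \text{(for all } uv \in E^+(G)) \\[2pt]
\quad \pi_u + \pi_v + \sigma_{u,v} \le 0 & \text{(for all } uv \in E^-(G)) \\[2pt]
\quad \displaystyle\sum_{v} \pi_v \le 1 & \\[2pt]
\quad \pi_z,\ \sigma_{u,v,z} \ge 0 & \text{(for all } z \in V(G) \text{ and all } uv \in E(G))
\end{array}
\]

Figure 2: Dual of $L$.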

Now we define a dual solution. Let $u'u''$ be an edge of the negative matching. Consider the dual solution defined below:

\[
\begin{aligned}
\pi_{u'} = \pi_{u''} &= 1/2, \\
\pi_v &= 0 && \text{for all } v \notin \{u', u''\}, \\
\sigma_{u',u'',z} &= 1/(2t-2) && \text{for all } z \notin \{u', u''\}, \\
\sigma_{u,v,z} &= 0 && \text{if } (u,v) \ne (u', u'').
\end{aligned}
\]

Clearly this solution has an objective value of 1; we check that it is feasible for $t \ge 2$. If $uv$ is an edge containing neither of $\{u',u''\}$, then $\pi_u = \pi_v = 0$ and $\sigma_{u,v} = 0$, since every term of $\sigma_{u,v}$ is 0. The edge $u'u''$ is a negative edge with $\pi_{u'} = \pi_{u''} = 1/2$, and after eliminating all the zero terms, we have

\[ \sigma_{u',u''} = \sum_{z \in V(G) - \{u',u''\}} \left( -\sigma_{u',u'',z} \right) = -\sum_{z \in V(G) - \{u',u''\}} \frac{1}{2t-2} = -1. \]

Thus, $\pi_{u'} + \pi_{u''} + \sigma_{u',u''} \le 0$, as required. Finally, if $uv$ is a positive edge with $u \in \{u',u''\}$, say if $u = u'$, then the only nonzero term of $\sigma_{u,v}$ is $\sigma_{u,u'',v}$, and we have $-\pi_u - \pi_v + \sigma_{u,v} = -1/2 + 1/(2t-2) \le 0$ as required. The same argument holds if $u = u''$.

Since this solution has an objective value of 1, matching the primal objective when $x_{uv} = 0$ everywhere, it is clearly optimal. Furthermore, if $t \ge 3$, then for each positive edge incident to $u'$ or $u''$, there is slack in the corresponding constraint of the dual problem. By complementary slackness, this implies that in any optimal solution to $L$, we have $x_{u'v} = x_{u''v} = 0$ for all $v \in V(G) - \{u',u''\}$. The triangle inequality constraints in $L$ then imply that in an optimal primal solution, $x_{uv} = 0$ for all $uv \in E(G)$. □
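
The feasibility computation in this proof is mechanical, so it can be checked with exact arithmetic. The sketch below is an illustration under stated assumptions: vertices are $0,\dots,2t-1$ with negative edges $\{2i,2i+1\}$, the chosen matching edge $u'u''$ is $\{0,1\}$, and $\sigma_{u,v}$ is aggregated as the signed sum over the six orderings of each triple, as in the abbreviation used in the proof. The function names are ours.

```python
from fractions import Fraction
from itertools import combinations

def check_dual_solution(t):
    """Return (feasible, objective, slacks) for the dual solution of the proof."""
    V = range(2 * t)
    u1, u2 = 0, 1                                   # the matching edge u'u''
    pi = {v: Fraction(0) for v in V}
    pi[u1] = pi[u2] = Fraction(1, 2)

    def sigma3(a, b, z):
        # Only sigma_{u',u'',z} is nonzero.
        return Fraction(1, 2 * t - 2) if (a, b) == (u1, u2) else Fraction(0)

    def sigma2(u, v):
        # Aggregated coefficient of x_uv over all triangle constraints.
        total = Fraction(0)
        for z in V:
            if z in (u, v):
                continue
            total += (-sigma3(u, v, z) - sigma3(v, u, z)
                      + sigma3(z, u, v) + sigma3(z, v, u)
                      + sigma3(u, z, v) + sigma3(v, z, u))
        return total

    slacks = {}
    for u, v in combinations(V, 2):
        if u // 2 == v // 2:                        # negative (matching) edge
            slacks[(u, v)] = pi[u] + pi[v] + sigma2(u, v)
        else:                                       # positive edge
            slacks[(u, v)] = -pi[u] - pi[v] + sigma2(u, v)

    feasible = all(s <= 0 for s in slacks.values()) and sum(pi.values()) <= 1
    objective = sum(pi[v] for v in V)               # d^-(v) = 1 for every v
    return feasible, objective, slacks

feasible, objective, slacks = check_dual_solution(3)
print(feasible, objective, slacks[(0, 2)])
```

For $t = 2$ the constraint on a positive edge at $u'$ is tight, while for $t \ge 3$ it has strict slack, which is the property the complementary slackness step relies on.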


B MaxAgree for Classical and Minimax Clustering 

In this paper, we have mainly focused on studying the MinDisagree formulation of $f$-Correlation Clustering, where we seek to minimize an objective function related to the clustering errors in a candidate solution, and where a $c$-approximation algorithm is an algorithm whose total error weight is at most $c$ times the optimal weight.

An alternative formulation to MinDisagree is MaxAgree, where we instead seek to maximize some 
function related to the edges that are not errors. In classical correlation clustering, this means that we 
want to maximize the number of edges which are correct. In minimax clustering, we wish to maximize the 
number of correct edges at the vertex with the fewest correct edges. In both cases, an optimal solution to 
MinDisagree is also an optimal solution to MaxAgree, but their approximation properties differ. 

In the classical case, there is a trivial 2-approximation algorithm for MaxAgree on arbitrary graphs: 
we can simply choose the better of clustering with all vertices in separate clusters and the clustering with all 
vertices in the same cluster. All negative edges are correct in the first clustering and all positive edges are 
correct in the second clustering, so taking the better of the two yields a clustering with at least half the edges 
correct, which is clearly at least half the value of an optimal clustering. Less trivially, Bansal, Blum, and 
Chawla [316] gave a PTAS for MaxAgree, so that any approximation ratio greater than 1 is achievable. 
In contrast, the best approximation ratio known for MinDisagree on arbitrary graphs is $O(\log n)$.

It is natural to ask whether some algorithm can also be found to approximate MaxAgree in the minimax context. The trivial 2-approximation algorithm no longer works, since if $G$ has both vertices of high positive degree and vertices of high negative degree, then each of the “extreme” clusterings will cause a large number of errors at some vertex. We have not been able to find any constant-factor approximation algorithm for the MaxAgree formulation of minimax clustering, even with the additional assumption that $G$ is a labeled complete graph.

We now construct a graph which seems to be a good example of the difficulties in designing an algorithm for this problem. For any $n$, let $G_n$ be the complete graph on $n+1$ vertices, and fix some vertex $u^* \in V(G_n)$. All edges incident to $u^*$ are labeled $+$, while all other edges are labeled $-$. Thus, $u^*$ has positive degree $n$, while all other vertices have positive degree 1.

It is clear that only one type of integer clustering could be optimal: cluster $u^*$ with some number $t$ of the remaining vertices, and cluster all other vertices as singletons. This yields $t$ correct edges at $u^*$, $n-t+1$ correct edges at each vertex clustered with $u^*$, and $n-1$ correct edges at each singleton vertex. Thus, the optimal clustering has $\lfloor (n+1)/2 \rfloor$ correct edges at its worst vertex.
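
The value $\lfloor (n+1)/2 \rfloor$ can be confirmed for small $n$ by brute force over all clusterings of the star-like instance. In the sketch below, vertex 0 plays the role of $u^*$; the encoding and function names are our own illustrative choices.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition

def worst_vertex_agreement(n, partition):
    """Fewest correct edges at any vertex; edges at vertex 0 are +, others -."""
    block_of = {v: i for i, block in enumerate(partition) for v in block}
    worst = n
    for v in range(n + 1):
        correct = 0
        for w in range(n + 1):
            if w == v:
                continue
            positive = (v == 0 or w == 0)
            same = block_of[v] == block_of[w]
            if positive == same:          # + within a cluster, or - across
                correct += 1
        worst = min(worst, correct)
    return worst

def best_minimax_agreement(n):
    """Optimal MaxAgree value in the minimax sense, by exhaustive search."""
    vertices = list(range(n + 1))
    return max(worst_vertex_agreement(n, p) for p in set_partitions(vertices))

for n in range(1, 6):
    print(n, best_minimax_agreement(n), (n + 1) // 2)
```

The same helper also shows how poorly the two extreme clusterings do: all singletons leave vertex 0 with no correct edges, and one giant cluster leaves every other vertex with a single correct edge.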

The following result demonstrates why algorithms based on LP rounding are likely to have trouble finding a good clustering of $G_n$ under the MaxAgree objective. We reuse the LP formulation of MinDisagree shown in Figure 1; this is valid because when we seek an exact solution, minimizing $M$ in Figure 1 is equivalent to maximizing $|V(G)| - 1 - M$, the weight of the correct edges at the worst vertex.

Proposition 9. Let $L$ be the linear program shown in Figure 1, formulated for $G_n$. If $n \ge 2$, then the unique optimal solution to $L$ has $x_{u^*v} = 1/3$ for all $v \ne u^*$ and $x_{vw} = 2/3$ for all $vw \in E(G_n - u^*)$.

Proof. In the proposed solution, we have $M = n/3$. To show that this solution is optimal and unique, we construct a solution to the dual program shown in Figure 2, as in the proof of Proposition 8. Consider the dual solution defined by

\[
\begin{aligned}
\pi_{u^*} &= 1 - \frac{n}{3(n-1)}, \\
\pi_v &= \frac{1}{3(n-1)} && \text{for all } v \ne u^*, \\
\sigma_{v,w,u^*} &= \frac{1}{3(n-1)} && \text{for all } vw \in E(G_n - u^*), \\
\sigma_{v,w,z} &= 0 && \text{if } z \ne u^*.
\end{aligned}
\]

Since $d^-(u^*) = 0$ and $d^-(v) = n-1$ for all $v \ne u^*$, the objective value of this solution is $n/3$. Thus, if this solution is feasible, then it is optimal.

To see that this solution is feasible, we observe that for $v, w \ne u^*$, we have $\sigma_{v,w} = -\sigma_{v,w,u^*} - \sigma_{w,v,u^*} = -(\pi_v + \pi_w)$, so that $\pi_v + \pi_w + \sigma_{v,w} \le 0$ for all negative edges $vw$, as needed. On the other hand, for $v \ne u^*$ we have

\[ \sigma_{u^*,v} = \sum_{z \notin \{u^*, v\}} \left( \sigma_{v,z,u^*} + \sigma_{z,v,u^*} \right) = 2(n-1)\pi_v. \]

Since $\pi_{u^*} = 1 - n\pi_v$, this implies that

\[ -\pi_{u^*} - \pi_v + \sigma_{u^*,v} = -(1 - n\pi_v) - \pi_v + 2(n-1)\pi_v = (n-1)\pi_v - 1 + 2(n-1)\pi_v = 0, \]

so that $-\pi_{u^*} - \pi_v + \sigma_{u^*,v} \le 0$ for all positive edges $u^*v$, as needed. Since also $\sum_v \pi_v = 1$, the proposed dual solution is feasible, so the given primal solution is optimal.

Now we argue that the given primal solution is the unique optimal solution. Let $x$ be any optimal primal solution. For each edge $vw \in E(G_n - u^*)$, the dual variable $\sigma_{v,w,u^*}$ is nonzero in the dual solution above, so by complementary slackness we have $x_{vw} = x_{u^*v} + x_{u^*w}$. Furthermore, since each $\pi_v > 0$, each $v \ne u^*$ must have total error weight equal to $M$, again by complementary slackness. Therefore, for each $v \ne u^*$, we have

\[
\begin{aligned}
M &= \sum_{w \in N^+(v)} x_{vw} + \sum_{w \in N^-(v)} (1 - x_{vw}) = x_{u^*v} + \sum_{w \notin \{v, u^*\}} \bigl( 1 - (x_{u^*v} + x_{u^*w}) \bigr) \\
  &= (n-1) - (n-3)\,x_{u^*v} - \sum_{w \ne u^*} x_{u^*w}.
\end{aligned}
\]

This implies that $x_{u^*v} = x_{u^*w}$ for all $v, w \ne u^*$. Letting $p$ denote this common value, we have $M = (n-1) - (n-3)p - np = (n-1) - (2n-3)p$. On the other hand, since $\pi_{u^*} > 0$, we also have

\[ M = \sum_{w \in N^+(u^*)} x_{u^*w} = np. \]

Thus, $(n-1) - (2n-3)p = np$, which implies that $p = 1/3$. Hence, in any optimal solution we have $x_{u^*v} = 1/3$ for all $v \ne u^*$ and $x_{vw} = 2/3$ for all $vw \in E(G_n - u^*)$, as desired. □
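
The optimality certificate in this proof can be checked with exact arithmetic. The sketch below is illustrative only: vertex 0 plays the role of $u^*$, and we take $\sigma_{v,w,u^*} = \pi_v = 1/(3(n-1))$, which is the value forced by the two feasibility identities in the proof (an assumption on our part about the intended dual solution). The function name is ours.

```python
from fractions import Fraction
from itertools import permutations

def check_prop9(n):
    """Return (triangle_ok, primal_M, dual_feasible, dual_objective)."""
    V = range(n + 1)                       # vertex 0 is u*

    def x(a, b):
        # Proposed primal solution: 1/3 on edges at u*, 2/3 elsewhere.
        return Fraction(1, 3) if 0 in (a, b) else Fraction(2, 3)

    def positive(a, b):
        return 0 in (a, b)

    triangle_ok = all(x(a, b) <= x(a, c) + x(c, b)
                      for a, b, c in permutations(V, 3))

    def fractional_errors(v):
        return sum(x(v, w) if positive(v, w) else 1 - x(v, w)
                   for w in V if w != v)

    primal_M = max(fractional_errors(v) for v in V)

    pi_v = Fraction(1, 3 * (n - 1))
    pi_star = 1 - n * pi_v
    sigma = pi_v                           # assumed value of sigma_{v,w,u*}
    neg_lhs = pi_v + pi_v - 2 * sigma      # constraint for a negative edge vw
    pos_lhs = -pi_star - pi_v + 2 * (n - 1) * sigma   # positive edge u*v
    dual_feasible = (neg_lhs <= 0 and pos_lhs <= 0
                     and pi_star >= 0 and pi_star + n * pi_v <= 1)
    dual_objective = n * (n - 1) * pi_v    # d^-(u*) = 0, d^-(v) = n - 1
    return triangle_ok, primal_M, dual_feasible, dual_objective

print(check_prop9(4))
```

Matching primal and dual values of $n/3$ certify optimality of the symmetric fractional solution; the check does not address uniqueness, which relies on the complementary slackness argument above.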

Thus, the only optimal solution to the natural LP relaxation is highly symmetric, but the natural symmetric clusterings of $G_n$ (into either all singletons or one giant cluster) both have at most 1 correct edge at the worst vertex, which is far short of the optimum value of $\lfloor n/2 \rfloor$ correct edges. We note that this does not pose a problem for the MinDisagree formulation: in a $c$-approximation for MinDisagree, we only promise that the generated clustering has at most $c\lceil n/2 \rceil$ errors at its worst vertex, and if $c \ge 2$, then any clustering at all meets this guarantee.

C NP-Completeness of Minimax Clustering on Complete Graphs 

To show that minimax clustering is NP-hard on complete graphs, we use a reduction from the Partition-into-Triangles problem, originally stated in [?] and attributed to Schaefer.

Partition into Triangles

Input: A graph $G$ with $|V(G)| = 3q$ for some integer $q$.

Question: Is there a partition of $V(G)$ into $q$ sets $V_1, \ldots, V_q$ such that each set $V_i$ induces a triangle in $G$?

Specifically, we reduce from the 4-regular case:

Theorem 10 (van Rooij, van Kooten Niekerk, Bodlaender [23]). Partition into Triangles on 4-regular graphs is NP-complete.

(Although this is not explicitly stated in [23], it follows immediately from two of their results: that the problem is NP-hard on graphs of maximum degree at most 4, and that every partition-into-triangles instance with maximum degree at most 4 can be transformed in polynomial time into an equivalent 4-regular instance.)

To prove that minimax clustering is NP-hard, we use the following reformulation, which is more convenient 
for our purposes. 

t-Perfect Clustering 

Input: A labeled complete graph $G$ together with a tolerance $t_v \in \mathbb{Z}^+$ for each $v \in V(G)$.

Question: Does $G$ admit a $t$-perfect clustering, that is, a clustering such that each vertex $v$ has at most $t_v$ incident mistakes?

Taking the weight of each vertex $v$ to be $1/t_v$, we see that $G$ has a $t$-perfect clustering if and only if the minimax-clustering value of the resulting weighted graph is at most 1.

Our NP-completeness proof mimics the proof given by Bansal, Blum, and Chawla for the classical correlation clustering problem. Let $G$ be a 4-regular graph on $n$ vertices, where $n > 7$, and let $G'$ be the labeled complete graph on the same vertex set whose positive edges are exactly the edges of $G$. Observe that $G$ has a partition into triangles if and only if $G'$ has a clustering with all clusters of size at most 3 and exactly 2 mistakes at each vertex. The idea is to expand $G'$ into a larger labeled complete graph $H$ such that in an optimal clustering of $H$, every cluster has at most three $G'$-vertices.

We use essentially the same construction as Bansal-Blum-Chawla. Let $H$ consist of $G'$, augmented as follows. For every 3-set $\{u,v,w\} \subseteq V(G')$, add to $H$ a clique $C_{uvw}$ with 7 vertices. All edges within $C_{uvw}$ are positive, all edges from $C_{uvw}$ to the vertices $\{u,v,w\}$ are positive, and all other edges incident to $C_{uvw}$ are negative.

We assign the following tolerances: each original vertex $v \in G'$ has $t_v = 7\left(\binom{n-1}{2} - 1\right) + 2$, and each added vertex $v \in H - G'$ has $t_v = 3$.


Lemma 11. If $H$ has a $t$-perfect clustering $\mathcal{C}$, then every cluster of $\mathcal{C}$ contains at most three vertices of $G'$, and every cluster of $\mathcal{C}$ contains vertices from exactly one clique of $H - G'$.


Proof. First suppose that $\mathcal{C}$ has a cluster $X$ containing vertices from two different cliques of $H - G'$. Let $v_1, v_2 \in X$ belong to the cliques $C_1, C_2$ respectively. If $|X \cap C_1| > 3$, then $v_2$ has more than 3 incident mistakes, which exceeds its tolerance. On the other hand, if $|X \cap C_1| \le 3$, then since $|C_1| = 7$, we have $|C_1 - X| \ge 4$, so $v_1$ has at least 4 incident mistakes, which again exceeds its tolerance. Thus, if $\mathcal{C}$ is $t$-perfect, then every cluster contains vertices from at most one clique.

Now suppose that $\mathcal{C}$ has a cluster $X$ that does not contain vertices from any clique of $H - G'$. Since clusters are nonempty, $X$ contains a vertex $v \in V(G')$. Since $v$ has $7\binom{n-1}{2}$ neighbors in $V(H - G')$ and is not clustered with any of them, $v$ has at least $7\binom{n-1}{2}$ incident mistakes, which exceeds its tolerance of $7\binom{n-1}{2} - 5$.

Finally, suppose that $\mathcal{C}$ has some cluster $X$ with at least four $G'$-vertices. Since $X$ contains vertices from at most one clique of $H - G'$, there is some vertex $v \in V(G') \cap X$ that does not have any positive neighbors in $X \cap V(H - G')$. Since $v$ has a total of $7\binom{n-1}{2}$ positive neighbors in $H - G'$, it again follows that $v$ has at least $7\binom{n-1}{2}$ incident mistakes, exceeding its tolerance. □


Corollary 12. H has a t-perfect clustering if and only if G has a partition into triangles. 


Proof. First suppose that $V_1, \ldots, V_k$ is a partition of $G$ into triangles. Cluster $H$ as follows: for $i \in [k]$, let $X_i = V_i \cup C_{V_i}$, where $C_{V_i}$ is the clique of $H$ associated with the 3-set $V_i$. For every clique $C$ that is not equal to some $C_{V_i}$, cluster $C$ on its own.

Each $v \in V(G')$ has exactly $7\left(\binom{n-1}{2} - 1\right) + 2$ mistakes: among the $7\binom{n-1}{2}$ positive edges to vertices of $H - G'$, it is clustered with exactly 7 of them, and among its 4 positive neighbors in $G$, it is clustered with exactly 2 of them (and with no negative neighbors), since $V_1, \ldots, V_k$ is a partition of $G$ into triangles. Furthermore, each $v \in V(H - G')$ has at most 3 mistakes, since this clustering has no mistakes within $H - G'$ and does not cluster any $w \in V(C_{xyz})$ with a vertex outside of $\{x, y, z\}$. Thus, the clustering is $t$-perfect.

Now suppose that $H$ has a $t$-perfect clustering $\mathcal{C}$. By Lemma 11, every cluster of $\mathcal{C}$ contains at most three vertices of $G'$ and contains vertices from exactly one clique $C_{uvw}$ of $H - G'$. We claim that the restriction of $\mathcal{C}$ to $V(G')$ is a partition of $G$ into triangles. If not, some vertex $v \in V(G')$ is clustered with fewer than 2 of its positive neighbors, and therefore has at least 3 incident mistakes in $G'$. Since the cluster containing $v$ contains vertices from only one of the cliques adjacent to $v$, we see that $v$ also has at least $7\left(\binom{n-1}{2} - 1\right)$ incident mistakes to vertices of $V(H - G')$, for a total of at least $7\left(\binom{n-1}{2} - 1\right) + 3$ incident mistakes. This exceeds its tolerance, contradicting the hypothesis that $\mathcal{C}$ is $t$-perfect. □
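
The forward direction of Corollary 12 can be illustrated on a concrete instance. The sketch below is our own example, not taken from the paper: it uses the $3 \times 3$ rook's graph (a 4-regular graph on 9 vertices whose rows form a partition into triangles), builds the labels of $H$ implicitly, and counts mistakes under the clustering described in the proof. All names are hypothetical.

```python
from itertools import combinations
from math import comb

def rook_graph_edges():
    """Edges of the 3x3 rook's graph: cells adjacent if they share a row or column."""
    cells = [(r, c) for r in range(3) for c in range(3)]
    index = {cell: i for i, cell in enumerate(cells)}
    return {frozenset((index[a], index[b]))
            for a, b in combinations(cells, 2)
            if a[0] == b[0] or a[1] == b[1]}

def count_mistakes(n, edges, triangles):
    """Mistakes and tolerances in H for the clustering built from `triangles`."""
    triples = list(combinations(range(n), 3))
    vertices = ([("g", i) for i in range(n)]                  # original vertices
                + [("c", tr, k) for tr in triples for k in range(7)])  # cliques

    def positive(a, b):
        if a[0] == "g" and b[0] == "g":
            return frozenset((a[1], b[1])) in edges
        if a[0] == "c" and b[0] == "c":
            return a[1] == b[1]                               # same clique
        g, c = (a, b) if a[0] == "g" else (b, a)
        return g[1] in c[1]                                   # clique of a 3-set containing g

    tolerance = {v: (7 * (comb(n - 1, 2) - 1) + 2) if v[0] == "g" else 3
                 for v in vertices}
    triangle_of = {i: tuple(sorted(tr)) for tr in triangles for i in tr}
    # A triangle is clustered with its own clique; every other clique is alone.
    cluster = {v: triangle_of[v[1]] if v[0] == "g" else v[1] for v in vertices}
    mistakes = {v: sum(1 for w in vertices
                       if w != v and positive(v, w) != (cluster[v] == cluster[w]))
                for v in vertices}
    return mistakes, tolerance

triangles = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
mistakes, tolerance = count_mistakes(9, rook_graph_edges(), triangles)
print(all(mistakes[v] <= tolerance[v] for v in mistakes))
```

In this instance every original vertex meets its tolerance with equality, which matches the count in the proof; the example only exercises the easy direction and says nothing about hardness.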


D NP-Completeness on Complete Bipartite Graphs 

In this section, we show that “one-sided” minimax clustering on complete bipartite graphs is NP-hard. This complements the approximation algorithm given in the main text for the same problem. Our proof is similar to the proof of Amit [?] which shows that biclustering with the classical objective function is NP-hard, but requires significant modifications to accommodate the new objective function. The proof uses a reduction from the 3-cover problem, which is well-known to be NP-complete [?].

3-Cover

Input: A ground set $U = \{u_1, \ldots, u_{3n}\}$ and a family of subsets $\mathcal{S} = \{S_1, \ldots, S_p\}$ with each $|S_i| = 3$.

Question: Is there a subfamily $\mathcal{S}' \subseteq \mathcal{S}$ such that each $u_i$ lies in exactly one element of $\mathcal{S}'$?

Given an instance of 3-cover, we construct an instance of the following problem: 

One-Sided t-perfect Biclustering 

Input: A labeled complete bipartite graph $G$ with partite sets $V_1, V_2$ and a tolerance $t_v \in \mathbb{Z}^+$ for each $v \in V_1$.

Question: Does $G$ have a clustering such that each vertex $v \in V_1$ has at most $t_v$ incident edges that are errors?

By the same argument used in Appendix C, any algorithm which exactly determines the optimal one-sided minimax clustering for complete bipartite graphs would also solve the $t$-perfect biclustering problem. Hence, it suffices to show that $t$-perfect biclustering is NP-hard. Note also that one-sided minimax clustering can be viewed as the special case of (two-sided) minimax clustering for which $t_v = |V_1|$ for all $v \in V_2$; thus, the reduction in this section also shows that the two-sided version of the problem is NP-hard.

Given a nontrivial instance of 3-cover (that is, an instance with $n, p > 1$), we construct an instance of $t$-perfect biclustering as follows. For each $u_i \in U$, construct a pair of vertices $x_i \in V_1$, $y_i \in V_2$. Call these vertices ground vertices. Each edge $x_iy_j$ is positive if $u_i = u_j$ or if $u_i$ and $u_j$ lie in some common triplet of $\mathcal{S}$, and negative otherwise.

For each $S_i \in \mathcal{S}$, we create a vertex $x(S_i) \in V_1$ and $m$ vertices $y_1(S_i), \ldots, y_m(S_i) \in V_2$, where $m > 6n + 3p$ is some fixed constant. Call these vertices triplet vertices, and let $B_i = \{x(S_i)\} \cup \{y_j(S_i) : j \in \{1, \ldots, m\}\}$. All edges $x(S_i)y_k(S_i)$ for a fixed $i$ are positive, and all edges $x(S_i)y_k(S_\ell)$ for $i \ne \ell$ are negative. For $u_i \in U$, if $u_i \in S_j$, then the edges $x_iy_k(S_j)$ and $y_ix(S_j)$ are positive, and otherwise these edges are negative.

Finally, let $Z = \{z_1, \ldots, z_{3n}\}$ be new $V_2$-vertices, and for each $z_i \in Z$, add positive edges to all ground-vertices in $V_1$ and negative edges to all triplet-vertices in $V_1$. Call these vertices dummy vertices.

Next we determine the tolerances $t_v$. For $S_i \in \mathcal{S}$, let $t_{x(S_i)} = 3$. For $u_i \in U$, the corresponding tolerances are computed more intricately. Let $d(u_i)$ be the number of triplets $S_j \in \mathcal{S}$ containing $u_i$ and let $c(u_i)$ be the number of $u_j \in U - \{u_i\}$ such that $u_j$ and $u_i$ lie in some common triplet of $\mathcal{S}$. We define

\[ t_{x_i} = m(d(u_i) - 1) + (c(u_i) - 2) + (|Z| - 3). \]

It is clear that $G$ and $t$ can be constructed in polynomial time.

Lemma 13. Suppose that $G$ has a $t$-perfect clustering $\mathcal{C}$. For any $S_i, S_j \in \mathcal{S}$ with $i \ne j$, the vertices $x(S_i)$ and $x(S_j)$ lie in different clusters.

Proof. Suppose that $x(S_i)$ and $x(S_j)$ lie in the same cluster $X$. Since $t_{x(S_i)} = 3$, we see that $X$ contains at least $m - 3$ vertices from $y_1(S_i), \ldots, y_m(S_i)$. Since $x(S_j)$ has negative edges to all these vertices, it follows that $x(S_j)$ has at least $m - 3$ incident errors. Since $m - 3 > 3 = t_{x(S_j)}$, this contradicts the fact that $\mathcal{C}$ is $t$-perfect. □

Lemma 14. Suppose that $G$ has a $t$-perfect clustering $\mathcal{C}$. For any $u_j \in U$, there is a unique $S_i \in \mathcal{S}$ such that $x_j$ is clustered with $x(S_i)$. Furthermore, this $S_i$ has the following properties:

1. $u_j \in S_i$, and

2. $x_j$ is clustered with each vertex $y_\ell$ such that $u_\ell \in S_i$.

Proof. First we prove the existence of a unique $S_i$ such that $x_j$ is clustered with $x(S_i)$, then we show that $S_i$ has the desired properties.

If $y_k(S_i)$ is a triplet $V_2$-vertex not clustered with $x(S_i)$, call $y_k(S_i)$ a rogue vertex. It is immediate from the definition of $t$ that in a $t$-perfect clustering, each $B_i$ contains at most 3 rogue vertices.

To prove that $x_j$ is clustered with some $x(S_i)$, it suffices to show that $x_j$ is clustered with some triplet $V_2$-vertex that is not a rogue vertex. Since each $B_i$ contains at most 3 rogue vertices, there are at most $3p$ rogue vertices in total, where $p = |\mathcal{S}|$. If all triplet vertices clustered with $x_j$ are rogue vertices, then since $x_j$ has $md(u_j)$ positive edges to triplet vertices, it follows that $x_j$ has at least $md(u_j) - 3p$ incident errors. Now we have

\[ t_{x_j} = m(d(u_j) - 1) + (c(u_j) - 2) + (|Z| - 3) < md(u_j) - m + 6n < md(u_j) - 3p, \]

where the last inequality follows from $m > 6n + 3p$. Thus, there are more than $t_{x_j}$ errors at $x_j$, contradicting the assumption that $\mathcal{C}$ is $t$-perfect. Thus, $x_j$ is clustered with some $x(S_i)$. Uniqueness of $S_i$ follows immediately from Lemma 13.

To see that $u_j \in S_i$, suppose that $u_j \notin S_i$. Then $x_j$ is clustered with at most 3 triplet-vertices that are its positive neighbors, and therefore has at least $md(u_j) - 3$ incident errors. Since $md(u_j) - 3 > t_{x_j}$, this contradicts the assumption that $\mathcal{C}$ is $t$-perfect.

Next we prove (2). Let $B = N^+(x_j) - N^+(x(S_i))$. Since $t_{x(S_i)} = 3$, the cluster containing $x_j$ contains at most 3 vertices from $B$. Thus, there are at least $|B| - 3$ errors from $x_j$ to the vertices of $B$, where

\[ |B| - 3 = |Z| + m(d(u_j) - 1) + (c(u_j) - 2) - 3 = t_{x_j}. \]

Thus, for $\mathcal{C}$ to be $t$-perfect, it is necessary that all errors incident to $x_j$ are edges from $x_j$ to $B$. In particular, $x_j$ is clustered with all vertices in $N^+(x_j) \cap N^+(x(S_i))$, so that $x_j$ is clustered with all $y_\ell$ such that $u_\ell \in S_i$. □

Corollary 15. $G$ has a $t$-perfect clustering if and only if $\mathcal{S}$ has a 3-cover.

Proof. Given any $t$-perfect clustering, let $\mathcal{S}'$ be the family of triplets $S_i$ such that some vertex of $B_i$ is clustered with some $V_1$-ground-vertex $x_j$. Lemma 14 immediately implies that these triplets cover all of $U$. Furthermore, Lemma 14 implies that these triplets are pairwise disjoint: if $S_1$ and $S_2$ are triplets of $\mathcal{S}'$ that both contain $u_j$, then Lemma 14 would force each of $x(S_1)$ and $x(S_2)$ to be clustered with $y_j$ and hence to be clustered together, which contradicts Lemma 13. Hence, $\mathcal{S}'$ is a 3-cover.

Conversely, let $\mathcal{S}'$ be a 3-cover in $\mathcal{S}$. We define a clustering of $G$. Since $\mathcal{S}'$ is a 3-cover, we have $|\mathcal{S}'| = n$. Let $\{Z_{S_i} : S_i \in \mathcal{S}'\}$ be a partition of $Z$ into $n$ disjoint sets of size 3, indexed by the sets of $\mathcal{S}'$. Now for each $S_i \in \mathcal{S}$, define a cluster $X_i$ by

\[
X_i = \begin{cases}
B_i \cup \{x_j, y_j : u_j \in S_i\} \cup Z_{S_i}, & \text{if } S_i \in \mathcal{S}', \\
B_i, & \text{otherwise.}
\end{cases}
\]

Since $\mathcal{S}'$ is a 3-cover, the clusters $X_i$ are pairwise disjoint and cover the vertices of $G$. We claim that this clustering is $t$-perfect. If $x(S_i)$ is a triplet vertex corresponding to some $S_i \notin \mathcal{S}'$, then $x(S_i)$ has exactly 3 incident errors, namely its edges to the ground-vertices $y_j$ with $u_j \in S_i$. On the other hand, if $x(S_i)$ is a triplet vertex corresponding to some $S_i \in \mathcal{S}'$, then $x(S_i)$ again has exactly 3 incident errors, namely its edges to the dummy-vertices in $Z_{S_i}$.

If $x_j$ is a ground vertex, then $x_j$ has $m(d(u_j) - 1)$ incident errors which are positive edges to triplet-vertices, $c(u_j) - 2$ incident errors which are positive edges to ground-vertices, and $|Z| - 3$ incident errors which are positive edges to dummy-vertices. This is a total of exactly $t_{x_j}$ incident errors. Hence the clustering is $t$-perfect. □


E Technical Details 

Lemma 16. Suppose a Type 2 cluster $\{u\} \cup T$ has just been output in Algorithm 1. For any $z \in S - (\{u\} \cup T)$, the total cluster-cost of the cross-edges for $z$ is at most $\max\{1/(1-2\alpha),\, 2/\alpha\}$ times the total LP-cost of the cross-edges for $z$.

Proof. This is essentially the same proof given by Charikar, Guruswami, and Wirth [?]; we repeat it here to keep the paper self-contained. If $x_{uz} \ge 1 - \alpha$, then for each $w \in \{u\} \cup T$, we have

\[ x_{wz} \ge x_{uz} - x_{uw} \ge 1 - 2\alpha. \]

If there are $p$ positive cross-edges, this implies that the total LP-cost of the cross-edges for $z$ is at least $(1-2\alpha)p$. Since the total cluster-cost of the cross-edges for $z$ is $p$, the claim holds.

Now consider $x_{uz} \in (\alpha, 1-\alpha)$. Let $P = N^+(z) \cap (\{u\} \cup T)$ and let $Q = N^-(z) \cap (\{u\} \cup T)$; the total cluster-cost of the cross-edges for $z$ is just $|P|$. We have the following lower bound on the total LP-cost of the cross-edges for $z$:

\[
\begin{aligned}
\sum_{w \in P} x_{wz} + \sum_{w \in Q} (1 - x_{wz})
  &\ge \sum_{w \in P} (x_{uz} - x_{uw}) + \sum_{w \in Q} (1 - x_{uz} - x_{uw}) \\
  &= |P|\,x_{uz} + |Q|\,(1 - x_{uz}) - \sum_{w \in \{u\} \cup T} x_{uw} \\
  &\ge |P|\,x_{uz} + |Q|\,(1 - x_{uz}) - \frac{\alpha}{2}\,(|P| + |Q|),
\end{aligned}
\]

where in the last line we used the inequality $\sum_{w \in \{u\} \cup T} x_{uw} \le \frac{\alpha}{2}\,|\{u\} \cup T|$. This lower bound is linear in $x_{uz}$, so we study its behavior at the endpoints of $(\alpha, 1-\alpha)$. When $x_{uz} = \alpha$, the lower bound rearranges as follows:

\[ \alpha|P| + (1-\alpha)|Q| - \frac{\alpha}{2}(|P| + |Q|) = \frac{\alpha}{2}|P| + \left(1 - \frac{3\alpha}{2}\right)|Q| \ge \frac{\alpha}{2}|P|. \]

When $x_{uz} = 1 - \alpha$, the lower bound rearranges as follows:

\[ (1-\alpha)|P| + \alpha|Q| - \frac{\alpha}{2}(|P| + |Q|) = \left(1 - \frac{3\alpha}{2}\right)|P| + \frac{\alpha}{2}|Q| \ge \frac{\alpha}{2}|P|. \]

In both cases, we used the assumption $\alpha < 1/2$, which implies $1 - \frac{3\alpha}{2} \ge \frac{\alpha}{2}$. It follows that charging $\frac{2}{\alpha}$ times the LP-cost of each cross-edge yields enough charge to pay for the cluster-cost of all cross-edges. □
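
The endpoint argument can be checked with exact arithmetic. The sketch below assumes the linear lower bound has the form $|P|\,x_{uz} + |Q|\,(1 - x_{uz}) - \frac{\alpha}{2}(|P| + |Q|)$, which is our reading of the derivation, and tests it against $\frac{\alpha}{2}|P|$ at $x_{uz} = \alpha$ and $x_{uz} = 1 - \alpha$ for small set sizes. The function names are ours.

```python
from fractions import Fraction

def lp_lower_bound(x_uz, p, q, alpha):
    """Assumed bound: |P| x_uz + |Q| (1 - x_uz) - (alpha/2)(|P| + |Q|)."""
    return p * x_uz + q * (1 - x_uz) - (alpha / 2) * (p + q)

def endpoints_ok(alpha, max_size=6):
    """True if the bound is at least (alpha/2)|P| at both endpoints,
    for all sizes |P|, |Q| up to `max_size`."""
    for p in range(max_size + 1):
        for q in range(max_size + 1):
            for x_uz in (alpha, 1 - alpha):
                if lp_lower_bound(x_uz, p, q, alpha) < (alpha / 2) * p:
                    return False
    return True

print(endpoints_ok(Fraction(377, 1000)))   # the value of alpha used in the paper
print(endpoints_ok(Fraction(9, 10)))       # a value violating alpha < 1/2
```

Because the bound is linear in $x_{uz}$, checking the two endpoints is enough to cover the whole interval; the second call shows the inequality can fail once $\alpha$ is too large.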

Lemma 17. Suppose that a Type 2 cluster $C$ has just been output in Algorithm 2. For any vertex $z \in V_1 - C$, the total cluster-cost of the cross-edges for $z$ is at most $\max\{1/(1-2\alpha),\, 2/\alpha\}$ times the total LP-cost of the cross-edges for $z$.

Proof. We essentially repeat the proof of Lemma 16. If $x_{uz} \ge 1 - \alpha$, then for each $w \in \{u\} \cup T$, we have

\[ x_{wz} \ge x_{uz} - x_{uw} \ge 1 - 2\alpha. \]

If there are $p$ positive cross-edges, this implies that the total LP-cost of the cross-edges for $z$ is at least $(1-2\alpha)p$. Since the total cluster-cost of the cross-edges for $z$ is $p$, the claim holds.

Now consider $x_{uz} \in (\alpha, 1-\alpha)$. Let $P = N^+(z) \cap (\{u\} \cup T)$ and let $Q = N^-(z) \cap (\{u\} \cup T)$; the total cluster-cost of the cross-edges for $z$ is just $|P|$. Note that $P \cup Q = V_2 \cap T$. We have the following lower bound on the total LP-cost of the cross-edges for $z$:

\[
\begin{aligned}
\sum_{w \in P} x_{wz} + \sum_{w \in Q} (1 - x_{wz})
  &\ge \sum_{w \in P} (x_{uz} - x_{uw}) + \sum_{w \in Q} (1 - x_{uz} - x_{uw}) \\
  &= |P|\,x_{uz} + |Q|\,(1 - x_{uz}) - \sum_{w \in V_2 \cap T} x_{uw} \\
  &\ge |P|\,x_{uz} + |Q|\,(1 - x_{uz}) - \frac{\alpha}{2}\,(|P| + |Q|),
\end{aligned}
\]

where in the last line we used the inequality $\sum_{w \in V_2 \cap T} x_{uw} \le \frac{\alpha}{2}\,|V_2 \cap T|$. This lower bound is linear in $x_{uz}$, so we study its behavior at the endpoints of $(\alpha, 1-\alpha)$. When $x_{uz} = \alpha$, the lower bound rearranges as follows:

\[ \alpha|P| + (1-\alpha)|Q| - \frac{\alpha}{2}(|P| + |Q|) = \frac{\alpha}{2}|P| + \left(1 - \frac{3\alpha}{2}\right)|Q| \ge \frac{\alpha}{2}|P|. \]

When $x_{uz} = 1 - \alpha$, the lower bound rearranges as follows:

\[ (1-\alpha)|P| + \alpha|Q| - \frac{\alpha}{2}(|P| + |Q|) \ge \left(1 - \alpha - \frac{\alpha}{2}\right)|P| + \frac{\alpha}{2}|Q| \ge \frac{\alpha}{2}|P|. \]

In both cases, we used the assumption $\alpha < 1/2$. It follows that when $x_{uz} \in (\alpha, 1-\alpha)$, charging $2/\alpha$ times the LP-cost of each cross-edge yields enough charge to pay for the cluster-cost of all cross-edges. □